(57) Abstract:

PROBLEM TO BE SOLVED: To extract a specific 
document block from image data read in from a copy of 
a newspaper, a magazine and the like, and produce 
document data that can be easily read and be 
economically and efficiently pasted to a region of a fixed 
size.

SOLUTION: A document block is extracted from 
captured image data. Character codes are recognized in 
a character image in the document block. Rectangular 
vector data are prepared wherein the shape of the 
document block is reconstructed (S202). In the 
rectangular vector data, character code data 
corresponding to the recognized character codes are laid 
out (S214, S215). 
* NOTICES *



JPO and NCIPI are not responsible for any 
damages caused by the use of this translation. 

1. This document has been translated by computer. So the translation may not reflect the original 
precisely. 

2. **** shows the word which can not be translated. 
3. In the drawings, any words are not translated. 



DETAILED DESCRIPTION 



[Detailed Description of the Invention] 
[0001] 

[Field of the Invention] This invention relates to an image processing system, an image-processing
method, an image-processing program, and a computer-readable recording medium on which the
image-processing program is recorded. More specifically, it relates to extracting a required document
block from image data obtained by reading a manuscript such as a newspaper or a magazine, and
obtaining predetermined document data.
[0002] 

[Description of the Prior Art] There are cases where a user wants to extract only a specific document
from a manuscript consisting of a full page (one page of a newspaper, a magazine, etc.) and acquire it
as data.

[0003] For example, JP,9-204511,A proposes equipment that reads a manuscript such as a newspaper
or a magazine to obtain image data, cuts out the character image of a header from the image data,
performs character recognition processing on it, and stores the resulting character code data of the
header in association with the character image data of the body text corresponding to that header.
[0004] 

[Problem(s) to be Solved by the Invention] However, although the equipment described in the
above-mentioned publication can obtain the character image data of the body text of the related article
when the character code data of a header are specified, the character image data of the body text retain
the layout of the manuscript as it is. Consequently, the document block (document field) in which the
obtained document data exist has an irregular shape and is hard to read. Moreover, when the obtained
document data are pasted into a field of fixed form size, the irregular shape leaves many margin parts,
which is inefficient.
[0005] This invention was made to solve the above-mentioned technical problem. Its purpose is to
extract a specific document block from image data obtained by reading a manuscript such as a
newspaper or a magazine, and to obtain document data that are easy to read and can be pasted
efficiently, without waste, into a field of fixed form size.
[0006] 

[Means for Solving the Problem] The purpose of this invention is attained by the following means.

[0007] (1) An image processing system characterized by having an extract means that extracts, from
image data to be processed, a document block in which a predetermined image exists, a recognition
means that recognizes character codes from the character image within said document block, a
reconstruction means that reconstructs said document block into a predetermined shape, and a layout
means that arranges, in said reconstructed document block, character code data corresponding to the
character codes recognized by said recognition means.

[0008] (2) The image processing system according to (1) above, characterized in that said extract
means extracts two or more document blocks, and said reconstruction means combines the extracted
document blocks into one and reconstructs them into a predetermined shape.

[0009] (3) The image processing system according to (1) above, characterized in that said
predetermined image includes the character image of a header and the character image of the body
text corresponding to that header.

[0010] (4) The image processing system according to (3) above, characterized by further having a
header character arrangement means that arranges the character code data corresponding to the
character image of said header at a predetermined position within the reconstructed document block.

[0011] (5) The image processing system according to (1) above, characterized in that said
reconstruction means adjusts the vertical or horizontal dimension of the document block to
approximately a natural-number multiple of the length of one tier of the columns formed in that
document block.

[0012] (6) The image processing system according to (1) above, characterized by further having a file
creation means that creates an electronic file storing the character code data arranged by said layout
means.

[0013] (7) The image processing system according to (1) above, characterized by further having a
printing means that prints the character code data arranged by said layout means on a recording
material.

[0014] (8) The image processing system according to (1) above, characterized by further having a
reading means that optically reads the image of a manuscript to obtain said image data to be
processed.

[0015] (9) An image-processing method characterized by having a step of extracting, from image data
to be processed, a document block in which a predetermined image exists, a step of recognizing
character codes from the character image within said document block, a step of reconstructing said
document block into a predetermined shape, and a step of arranging, in said reconstructed document
block, character code data corresponding to said recognized character codes.

[0016] (10) An image-processing program for causing a computer to function as a means that extracts,
from image data to be processed, a document block in which a predetermined image exists, a means
that recognizes character codes from the character image within said document block, a means that
reconstructs said document block into a predetermined shape, and a means that arranges, in said
reconstructed document block, character code data corresponding to said recognized character codes.

[0017] (11) A computer-readable recording medium on which the program according to (10) above is
recorded.

[0018] 

[Embodiment of the Invention] Hereafter, an embodiment of this invention is explained with
reference to the attached drawings.

[0019] Drawing 1 is a block diagram showing the outline configuration of the image processing
system according to one embodiment of this invention.

[0020] An image processing system 100 has a filing function: it reads a manuscript such as a
newspaper or a magazine, extracts the required document data from the obtained image data, and saves
them as an electronic file.

[0021] This image processing system 100 has a CPU 110, a ROM 120, a RAM 130, a control unit 140,
a hard disk 150, an archive-medium drive 160, an ASIC 170, and a scanner engine 180.

[0022] The CPU 110 controls the whole image processing system 100 according to a program.

[0023] The ROM 120 stores control programs and data. The image-processing program described
later is also stored in the ROM 120.

[0024] The RAM 130 temporarily stores data and programs. For example, it has an area that
temporarily stores the document image data within a document block (document field) in the image
data.

[0025] The control unit 140 has a touch panel display, input keys, etc. (not illustrated).
[0026] The hard disk 150 can store an operating system and various application programs. It can also
store the created electronic file.

[0027] The archive-medium drive 160 can read and write various data, such as the created electronic
file, from and to various removable archive media (for example, a flexible disk, an MO disk, etc.).
[0028] The ASIC 170 is equipped with the mark detection section 171, the image-processing section
172, and the field distinction section 173. The mark detection section 171 detects the mark that shows
the location of a specific document block in the image data. The image-processing section 172 performs
predetermined image processing, such as noise rejection, on the document image data within that
document block. The field distinction section 173 creates character image data and photograph image
data from the document image data.
[0029] The scanner engine 180 can obtain image data by reading a manuscript. 

[0030] Next, the procedure for creating the document, character, and photograph image data is
explained using the flow chart of drawing 2. The contents of the flow chart shown in drawing 2 are
stored in the ROM 120 as a program and executed by the CPU 110.
[0031] First, at step S101, the start of a prescan of the manuscript is instructed. The scanner engine
180 then optically reads a manuscript consisting of a full page of a newspaper, a magazine, etc. The
scanner engine 180 can perform both a prescan, which reads the manuscript coarsely, and a main scan,
which reads it finely. The prescan image data output from the scanner engine 180 are input to the
mark detection section 171.

[0032] At step S102, it is judged whether a mark was detected in the prescan image data. Here, the
mark detection section 171 detects a mark 12 consisting of a thick line frame in the prescan image data
10, as shown in drawing 4. When no mark is detected (step S102: NO), step S103 is performed; when a
mark is detected (step S102: YES), step S104 is performed. The mark is added to the manuscript
beforehand by the user with a marker such as a felt pen. The user marks a field that includes at least a
header and the body text corresponding to it.

[0033] At step S103, a screen prompting the user to add a mark to the manuscript is displayed on the
touch panel display of the control unit 140. In this case, the user adds a mark to the manuscript and
then redoes the document block extraction processing.

[0034] On the other hand, at step S104, the coordinate values of the vertices located at the corners of
the document block 14 surrounded by the mark 12 detected by the mark detection section 171 are
transmitted to the scanner engine 180. That is, in this embodiment, the field that the user specified by
marking is extracted as the document block 14 to be processed. However, the method of extracting the
document block 14 is not limited to this. For example, the character image corresponding to a header,
the character image corresponding to the body text, images corresponding to ruled lines, etc. may be
detected by well-known techniques, and a field that includes at least a header and its corresponding
body text may be extracted automatically as the document block 14 based on the detected images.

[0035] At step S105, the start of the main scan of the manuscript is instructed. Here, the scanner
engine 180 sets the range specified by the vertex coordinate values fed back from the mark detection
section 171 as the main-scan range, and performs the main scan. The document block 14 is thereby
extracted. The main-scan image data within the document block 14 output from the scanner engine
180 are input to the image-processing section 172.

[0036] At step S106, various image processing is performed by the image-processing section 172.
Specifically, the image-processing section 172 applies image processing such as noise rejection, skew
correction, vertical/horizontal orientation detection, and character emphasis to the main-scan image
data, and obtains the document image data 16 (refer to drawing 5). The document image data 16
output from the image-processing section 172 are input to the field distinction section 173.
[0037] At step S107, field distinction processing is performed by the field distinction section 173.
Specifically, the field distinction section 173 distinguishes, in the document image data, the character
image section 18 in which a character image exists from the photograph section 20 in which a
photograph (including a pattern) exists. Since this field distinction method is a well-known technique,
its detailed explanation is omitted. The field distinction section 173 also extracts the character image
section 18 to create the character image data 22 (refer to drawing 6), and extracts the photograph
section 20 to create the photograph image data 24 (refer to drawing 7).

http://www4.ipdl.ncipi.go.jp/cgi-bin/tran_web_cgi_ejje 9/14/06

[0038] At step S108, the document image data 16, the character image data 22, and the photograph
image data 24 output from the field distinction section 173 are input to and stored in the RAM 130.
However, when the photograph image data 24 do not exist, only the document image data 16 are
stored in the RAM 130.

[0039] Next, the procedure for reconstruction processing of the document block and the character code
data is explained using the flow chart of drawing 3. The contents of the flow chart shown in drawing 3
are stored in the ROM 120 as a program and executed by the CPU 110.
[0040] At step S201, the area of the document block 14 in which the document image data 16 are
arranged is calculated. Specifically, the area of the document block 14 is obtained by calculating the
area of the document image data 16 from the total number of dots of the document image data 16
stored in the RAM 130 and the preset resolution (dpi) of the scanner engine 180. The area of the
document block 14 can also be calculated from the coordinate values of the vertices located at the
corners of the document block 14.
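
The area calculation of step S201 can be sketched as follows. This is only an illustration under stated assumptions, not the implementation of the embodiment: the function names are hypothetical, dots are assumed to be square, and the shoelace formula is one common way (assumed here) of obtaining a polygon area from vertex coordinates.

```python
MM_PER_INCH = 25.4


def area_from_dots(total_dots: int, dpi: float) -> float:
    """Area in mm^2 covered by total_dots square dots scanned at dpi."""
    dot_side_mm = MM_PER_INCH / dpi
    return total_dots * dot_side_mm ** 2


def area_from_vertices(vertices: list[tuple[float, float]]) -> float:
    """Polygon area by the shoelace formula; vertices are listed in order."""
    doubled = 0.0
    for i, (x0, y0) in enumerate(vertices):
        x1, y1 = vertices[(i + 1) % len(vertices)]
        doubled += x0 * y1 - x1 * y0
    return abs(doubled) / 2.0
```

The second function also handles an irregular (for example L-shaped) block, provided its corners are given in order around the outline.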

[0041] At step S202, the document block 14 is reconstructed into a predetermined shape. Specifically,
rectangle vector data 26 (refer to drawing 8) having the same area as the document block 14 and a
predetermined aspect ratio are created as the reconstructed document block. The predetermined aspect
ratio is set, for example, to that of A4 size, that is, width A : vertical dimension B = 210:297. The
predetermined shape is not limited to a rectangle; any shape can be used as long as it is regular and
readable. The created rectangle vector data 26 are stored in the RAM 130.
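
As a worked illustration of step S202, the width and vertical dimension of a rectangle with a given area and aspect ratio follow from A x B = area and A : B = 210 : 297. The sketch below assumes a hypothetical function name and uses the A4 ratio only as the default mentioned in the text.

```python
import math


def rectangle_with_area(area: float, ratio_w: float = 210.0, ratio_h: float = 297.0):
    """Return (width A, vertical dimension B) with A * B == area and A : B == ratio_w : ratio_h."""
    width = math.sqrt(area * ratio_w / ratio_h)
    height = area / width
    return width, height
```

For example, an area of 210 x 297 square units yields a rectangle of approximately 210 by 297 units.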

[0042] At step S203, character recognition processing is performed on the character image data
(document image data) stored in the RAM 130, and character code data are obtained. The character
code data include the font size. Character recognition processing can also be performed before the
rectangle vector data are created.

[0043] At step S204, the character code data are classified into header and body text. Here, the
frequency of occurrence is computed for each font size of the character code data. Character code data
whose font size is larger than a predetermined value and whose frequency of occurrence is lower than
a predetermined value are classified as the character code data of a header. Character code data whose
font size is smaller than a predetermined value and whose frequency of occurrence is higher than a
predetermined value are classified as the character code data of the body text. However, the method of
distinguishing a header from the body text is not limited to the above-mentioned method.
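
The classification rule of step S204 can be sketched as follows. The function name, the input format (a list of character and font-size pairs), and both threshold parameters are hypothetical; the text does not give concrete threshold values.

```python
from collections import Counter


def classify_by_font_size(chars, size_threshold, freq_threshold):
    """Split (character, font_size) pairs into header, body text, and unclassified strings."""
    # Frequency of occurrence of each font size across all recognized characters.
    freq = Counter(size for _, size in chars)
    header, body, other = [], [], []
    for ch, size in chars:
        if size > size_threshold and freq[size] < freq_threshold:
            header.append(ch)   # large and rare: header
        elif size < size_threshold and freq[size] > freq_threshold:
            body.append(ch)     # small and frequent: body text
        else:
            other.append(ch)    # neither rule applies
    return "".join(header), "".join(body), "".join(other)
```

Characters that satisfy neither rule are returned separately here; how such characters should be handled is not stated in the text.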
[0044] In addition, during character recognition processing, when one or more spaces follow a period
(".") located at the end of a sentence in the character code data of the body text, a formal paragraph
may be distinguished by attaching a carriage return code immediately after that period.
[0045] At step S205, the height of one tier of the columns formed in the document block 14 is
calculated (when the body text is written vertically). Although the case where the body text is written
vertically is explained here, this invention is also applicable to horizontal writing. The height of one
tier is computed, for example, from the font size of the obtained character code data and the number of
characters per tier of the columns.

[0046] At step S206, the vertical dimension B of the rectangle vector data 26 is corrected to
approximately a natural-number multiple of the height C of one tier of the columns 28 (refer to
drawing 9). This yields rectangle vector data 26 without useless margin parts while maintaining the
tier height of the columns in the manuscript. The corrected vertical dimension is set to the value
nearest to the vertical dimension of the initial rectangle vector data 26 shown in drawing 8. In
addition, to maintain the area of the initial rectangle vector data 26, width A is corrected in accordance
with the correction of the vertical dimension B. The corrected rectangle vector data 26 are stored in
the RAM 130 again.
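
Steps S205 and S206 can be sketched as follows. This is a simplified illustration with hypothetical function names: the character pitch is assumed to equal the font size (no inter-character spacing), and the multiple is chosen by rounding to the nearest natural number, with a minimum of one tier.

```python
def tier_height(font_size: float, chars_per_tier: int) -> float:
    """Estimated height C of one column tier (character pitch assumed equal to font size)."""
    return font_size * chars_per_tier


def snap_to_tiers(width: float, height: float, tier: float):
    """Correct the height to the nearest natural-number multiple of tier, keeping the area."""
    area = width * height
    n = max(1, round(height / tier))
    new_height = n * tier
    new_width = area / new_height  # width follows from the preserved area
    return new_width, new_height
```

For example, with A = 210, B = 297, and C = 50, the nearest multiple is 6 x 50 = 300, and the width becomes 62370 / 300 = 207.9.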
[0047] At step S207, it is judged whether the character code data of the header can be arranged within
the corrected rectangle vector data 26. Specifically, width A and vertical dimension B of the corrected
rectangle vector data 26 are compared with the width and height of the character code data of the
header, respectively. When the character code data of the header overflow the rectangle vector data 26
(step S207: NO), step S208 is performed; when they can be arranged within the rectangle vector data
26 (step S207: YES), step S209 is performed.
[0048] At step S208, the rectangle vector data 26 are corrected again according to the size of the
character code data of the header, and are stored in the RAM 130 again. Here, only the dimension
(vertical or horizontal) of the rectangle vector data 26 in the direction in which the character code data
of the header overflow is expanded.

[0049] On the other hand, at step S209, it is judged whether the photograph image data 24 are stored
in the RAM 130. When the photograph image data 24 exist (step S209: YES), step S210 is performed;
when they do not exist (step S209: NO), step S214 is performed.
[0050] At step S210, it is judged whether the photograph section 20 in the photograph image data 24
can be arranged within the corrected rectangle vector data 26. Specifically, width A and vertical
dimension B of the corrected rectangle vector data 26 are compared with the width and height of the
photograph section 20, respectively. When the photograph section 20 overflows the rectangle vector
data 26 (step S210: NO), step S211 is performed; when it can be arranged within the rectangle vector
data 26 (step S210: YES), step S212 is performed.
[0051] At step S211, the rectangle vector data 26 are corrected again according to the size of the
photograph section 20, and are stored in the RAM 130 again. Here, only the dimension (vertical or
horizontal) of the rectangle vector data 26 in the direction in which the photograph section 20
overflows is expanded.
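
The header check (steps S207 and S208) and the photograph check (steps S210 and S211) apply the same rule, which can be sketched as follows. The function name and the returned tuple are hypothetical choices for illustration.

```python
def expand_to_fit(rect_w: float, rect_h: float, item_w: float, item_h: float):
    """Return (fits, width, height); only a dimension that the item overflows is enlarged."""
    fits = item_w <= rect_w and item_h <= rect_h
    return fits, max(rect_w, item_w), max(rect_h, item_h)
```

For instance, `expand_to_fit(200, 300, 250, 100)` reports that the item does not fit and widens the rectangle to 250 while leaving the height at 300.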

[0052] On the other hand, at step S212, the position of the photograph section 20 within the
photograph image data 24 is measured. Specifically, the coordinates (distance r, direction theta) of the
upper-right corner of the photograph section 20 are computed relative to the upper-right corner of the
photograph image data 24 (which is the same point as the upper right of the document image data 16)
(refer to drawing 7).

[0053] At step S213, as shown in drawing 10, the photograph section 20 is arranged first in the
rectangle vector data 26 according to the measured position. The layout of the manuscript is thereby
maintained to some extent.
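
The position measurement of step S212 and its use in step S213 can be sketched as follows. Several points are assumptions not stated in the text: the function names are hypothetical, coordinates use x to the right and y downward, theta is measured with `math.atan2`, and the same (r, theta) offset is applied from the upper-right corner of the rectangle vector data.

```python
import math


def polar_offset(reference, target):
    """Distance r and direction theta (radians) from reference to target."""
    dx = target[0] - reference[0]
    dy = target[1] - reference[1]
    return math.hypot(dx, dy), math.atan2(dy, dx)


def place_from_polar(reference, r, theta):
    """Point reached by applying the offset (r, theta) from reference."""
    return (reference[0] + r * math.cos(theta),
            reference[1] + r * math.sin(theta))
```

Measuring the offset against the upper-right corner of the source image and re-applying it from the upper-right corner of the new rectangle keeps the photograph at the same distance and direction from that corner.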

[0054] At step S214, the character code data 30 of the header are arranged at a predetermined position
in the rectangle vector data 26, specifically at the upper right. This improves readability. When the
body text is written horizontally, the character code data 30 of the header are arranged at the upper
left.
[0055] At step S215, the character code data 32 of the body text are arranged in the rectangle vector
data 26. Here, the character code data 32 of the body text are arranged in order from the upper right to
the lower left in the margin parts corresponding to the columns 28, excluding the parts of the rectangle
vector data 26 where the photograph section 20 and the character code data 30 of the header are
arranged.
[0056] At step S216, the document data 34 (refer to drawing 11), completed by arranging the
character code data 30 of the header, the character code data 32 of the body text, and the photograph
section 20 in the rectangle vector data 26, are saved as an electronic file on the hard disk 150. The
document data 34 may also be saved to a flexible disk etc. by the archive-medium drive 160.
[0057] As described above, according to this embodiment, the document block 14 is extracted from
image data obtained by optically reading the image of a manuscript such as a newspaper or a magazine
using the scanner engine 180. Character codes are recognized from the character image within the
document block 14, rectangle vector data 26 that reconstruct the shape of the document block 14 are
created, and character code data corresponding to the recognized character codes are arranged in the
rectangle vector data 26. Therefore, specific document data can be extracted from image data obtained
by reading a manuscript such as a newspaper or a magazine, and document data that are easy to read
and can be pasted efficiently, without waste, into, for example, a file space of fixed form size can be
obtained easily.

[0058] This invention is not limited to the above-mentioned embodiment and can be variously
changed within the scope of the claims.

[0059] For example, although the above embodiment was explained using an example in which there
is one document block 14 surrounded by the mark 12 in the prescan image data 10 obtained by reading
the manuscript, this invention is not restricted to this. As shown in drawing 12 (A), this invention can
also be applied when two or more document blocks exist. In this case, the area of the document block
is obtained as the total of the areas of the individual document blocks 14. Then, as shown in drawing
12 (B), the character code data 30 of the header, the character code data 32 of the body text, etc. are
arranged block by block, for example in order from the upper right to the lower left in the margin parts
of the rectangle vector data 26 (when the body text is written vertically). However, the arrangement
method is not particularly limited. According to this example, even when the required documents are
scattered in two or more places in the manuscript, they are gathered into one, and document data that
are easy to read and can be pasted more efficiently can be obtained easily.

[0060] Moreover, although in the above embodiment the range of the document block 14 was
specified by the mark 12 and extracted, it is also possible to extract all document blocks automatically
from image data obtained by reading a manuscript consisting of a full page of a newspaper, a
magazine, etc.

[0061] Moreover, in addition to the configuration shown in drawing 1, the image processing system
may have an interface for exchanging data with other information equipment. The created document
data 34 (refer to drawing 11) can thereby be transmitted to other information equipment, such as a
computer or a printer.

[0062] Moreover, in addition to the configuration shown in drawing 1, the image processing system
may have a printer engine that prints data on a recording material such as a cut sheet, an OHP sheet,
or a roll sheet. The created document data 34 (refer to drawing 11) can thereby be printed on the
recording material.

[0063] Each means constituting the image processing system and the image-processing method
according to this invention can be realized by either a dedicated hardware circuit or a programmed
computer. When this invention is realized by a programmed computer, the program that operates the
computer can be provided on a computer-readable recording medium (for example, a floppy
(trademark) disk, CD-ROM, etc.). In this case, the program recorded on the computer-readable
recording medium is usually transferred to and stored on a hard disk. The program may be provided
independently as application software, or may be incorporated into the software of the computer
apparatus as one of its functions.
[0064] 

[Effect of the Invention] As explained above, according to this invention, a specific document block is
extracted from image data obtained by reading a manuscript such as a newspaper or a magazine, and
document data that are easy to read and can be pasted efficiently, without waste, into a field of fixed
form size can be obtained easily.



CLAIMS 



[Claim(s)] 

[Claim 1] An image processing system characterized by having an extract means that extracts, from
image data to be processed, a document block in which a predetermined image exists, a recognition
means that recognizes character codes from the character image within said document block, a
reconstruction means that reconstructs said document block into a predetermined shape, and a layout
means that arranges, in said reconstructed document block, character code data corresponding to the
character codes recognized by said recognition means.

[Claim 2] The image processing system according to claim 1, characterized in that said extract means
extracts two or more document blocks, and said reconstruction means combines the extracted
document blocks into one and reconstructs them into a predetermined shape.
[Claim 3] The image processing system according to claim 1, characterized in that said predetermined
image includes the character image of a header and the character image of the body text
corresponding to that header.

[Claim 4] The image processing system according to claim 3, characterized by further having a header
character arrangement means that arranges the character code data corresponding to the character
image of said header at a predetermined position within the reconstructed document block.

[Claim 5] The image processing system according to claim 1, characterized in that said reconstruction
means adjusts the vertical or horizontal dimension of the document block to approximately a
natural-number multiple of the length of one tier of the columns formed in that document block.

[Claim 6] The image processing system according to claim 1, characterized by further having a file
creation means that creates an electronic file storing the character code data arranged by said layout
means.

[Claim 7] The image processing system according to claim 1, characterized by further having a
printing means that prints the character code data arranged by said layout means on a recording
material.

[Claim 8] The image processing system according to claim 1, characterized by further having a
reading means that optically reads the image of a manuscript to obtain said image data to be
processed.

[Claim 9] An image-processing method characterized by having a step of extracting, from image data
to be processed, a document block in which a predetermined image exists, a step of recognizing
character codes from the character image within said document block, a step of reconstructing said
document block into a predetermined shape, and a step of arranging, in said reconstructed document
block, character code data corresponding to said recognized character codes.

[Claim 10] An image-processing program for causing a computer to function as a means that extracts,
from image data to be processed, a document block in which a predetermined image exists, a means
that recognizes character codes from the character image within said document block, a means that
reconstructs said document block into a predetermined shape, and a means that arranges, in said
reconstructed document block, character code data corresponding to said recognized character codes.

[Claim 11] A computer-readable recording medium on which the program according to claim 10 is
recorded.



[Translation done.] 